Ensemble learning
Bagging and boosting are the two major families of ensemble techniques. Most other methods (such as stacking, blending, or random subspaces) can be viewed as variations or combinations of these core approaches.
1. Bagging (Bootstrap Aggregating):
- Key Idea: Reduce variance by training multiple models independently and averaging their outputs (for regression) or taking a majority vote (for classification).
- Characteristics:
- Uses bootstrapped datasets (sampling with replacement).
- Models are trained in parallel, so there's no dependency between them.
- Works best with models that have high variance (e.g., decision trees).
- Example:
- Random Forest: Trains each decision tree on a bootstrap sample of the data and a random subset of features, then aggregates their predictions (see the sketch below).
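A minimal bagging sketch in Python, assuming scikit-learn 1.2+ (where the base learner parameter is named `estimator`); the synthetic dataset and hyperparameters are illustrative only:

```python
# Bagging vs. Random Forest on a synthetic dataset (illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Generic bagging: 100 full-depth decision trees, each fit on a bootstrap sample.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # high-variance base learner
    n_estimators=100,
    bootstrap=True,  # sample with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)
print("Bagged trees accuracy:", bagging.score(X_test, y_test))

# Random Forest: bagging plus a random subset of features at each split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)
print("Random Forest accuracy:", forest.score(X_test, y_test))
```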
2. Boosting:
- Key Idea: Reduce bias by sequentially training models, where each model corrects the mistakes of its predecessor.
- Characteristics:
- Models are trained in sequence, with each one focusing on the errors of the previous models.
- In AdaBoost-style methods, sample weights are updated to give more importance to misclassified examples; gradient-boosting methods instead fit each new model to the residual errors of the current ensemble.
- Works best with weak learners (e.g., shallow decision trees).
- Examples:
- AdaBoost: Increases weights on misclassified samples.
- Gradient Boosting: Optimizes a loss function by training models sequentially to minimize residuals.
- XGBoost, LightGBM, CatBoost: Optimized gradient-boosting libraries with engineering and algorithmic improvements for speed and, often, accuracy (a boosting sketch follows this list).
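A minimal boosting sketch, again assuming scikit-learn 1.2+; the hyperparameters are illustrative, not tuned:

```python
# AdaBoost and Gradient Boosting on a synthetic dataset (illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost: depth-1 stumps trained in sequence, reweighting misclassified samples.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner
    n_estimators=200,
    learning_rate=0.5,
    random_state=0,
)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))

# Gradient Boosting: each shallow tree fits the residual errors of the ensemble so far.
gb = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0)
gb.fit(X_train, y_train)
print("Gradient Boosting accuracy:", gb.score(X_test, y_test))
```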
Comparison Between Bagging and Boosting:
| Feature | Bagging | Boosting |
|---|---|---|
| Goal | Reduce variance | Reduce bias |
| Training | Parallel (independent models) | Sequential (dependent models) |
| Focus | Equal treatment of all samples | Focus on hard-to-predict samples |
| Overfitting risk | Lower | Higher if not tuned well |
| Examples | Random Forest, bagged trees | AdaBoost, Gradient Boosting |
Why Some People Focus on Just Bagging and Boosting
- These two are the foundation of most ensemble methods.
- Stacking, blending, and related methods build on the same idea of combining multiple models, typically by training a meta-learner on the base models' predictions (a stacking sketch follows below).
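For completeness, a minimal stacking sketch using scikit-learn's StackingClassifier, which trains a meta-learner on cross-validated predictions from the base models; the choice of base learners and meta-learner here is purely illustrative:

```python
# Stacking a bagging-style and a boosting-style ensemble (illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Base learners: one bagging-style ensemble and one boosting-style ensemble.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
    ("gb", GradientBoostingClassifier(n_estimators=100, random_state=1)),
]

# Meta-learner combines the base learners' out-of-fold predictions.
stack = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```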